Abstract



We consider a regular n-ary tree of height h, for which every vertex except the root 
is labelled with an independent and identically distributed continuous random variable. 
Taking motivation from a question in evolutionary biology, we consider the number of 
simple paths from the root to a leaf along vertices with increasing labels. We show that 
if $\alpha = n/h$ is fixed and $\alpha > 1/e$, the probability that there exists such a path converges to 1
as $h \to \infty$. This complements a previously known result that the probability converges
to 0 if $\alpha < 1/e$.

1 Introduction

Consider a regular $n$-ary tree of height $h$, where $n = \alpha h$. To each vertex except the root attach
an independent and identically distributed continuous random variable. We are concerned
with whether there is a simple (that is, non-backtracking) path from the root to a leaf whose
labels only increase. Nowak and Krug [7] called this accessibility percolation and showed that
$\mathbb{P}(\text{there exists an increasing path}) \to 0$ as $n \to \infty$ if $\alpha < 1/e$, whereas if $\alpha > 1$ then there
exists some $p > 0$ depending on $\alpha$ such that $\mathbb{P}(\text{there exists an increasing path}) \ge p$. We give
a complete characterisation in terms of $\alpha$, showing that there is a phase transition at $\alpha = 1/e$.

Theorem 1. For $\alpha > 1/e$, $\mathbb{P}(\text{there exists an increasing path}) \to 1$ as $h \to \infty$.

In fact we will show that for any $\alpha > 1/e$, there exist $\delta > 0$ and $\eta > 0$ such that
$\mathbb{P}(\text{there exist at least } \exp(\delta h) \text{ increasing paths}) \ge 1 - \exp(-\eta h)$.
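
The model is easy to explore numerically for small heights. The following is a minimal simulation sketch, not taken from the paper: the function names `count_increasing_paths` and `estimate_probability` are hypothetical, the unlabelled root is modelled by a starting value of 0, and $n$ is taken to be $\alpha h$ rounded to the nearest integer. Labels are drawn lazily, which is legitimate because the labels are independent and a subtree below a non-increasing step is never inspected.

```python
import random


def count_increasing_paths(n, h, rng, last=0.0, depth=0):
    """Count root-to-leaf paths with increasing labels in an n-ary tree of height h.

    Labels are i.i.d. U[0, 1] and are drawn only when needed: a child's subtree
    is explored only if the child's label exceeds the label of its parent.
    The root carries no label, which is modelled here by last=0.0.
    """
    if depth == h:
        return 1
    total = 0
    for _ in range(n):
        label = rng.random()
        if label > last:
            total += count_increasing_paths(n, h, rng, label, depth + 1)
    return total


def estimate_probability(alpha, h, trials=200, seed=0):
    """Monte Carlo estimate of P(there exists an increasing path) with n = round(alpha * h)."""
    rng = random.Random(seed)
    n = max(1, round(alpha * h))
    hits = sum(count_increasing_paths(n, h, rng) > 0 for _ in range(trials))
    return hits / trials
```

Comparing, say, `estimate_probability(0.2, h)` with `estimate_probability(1.0, h)` for a few moderate values of `h` gives a rough feel for the two regimes, although convergence near $\alpha = 1/e$ should be expected to be slow.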

1.1 Biological motivation

Consider the following simplified model of evolution in a population. Each genetic type, or 
genotype, in the population has an associated fitness. A particular genotype may give rise 
to multiple new genotypes through mutations, which either replace the original genotype or 
disappear from the population. If the rate of selection is stronger than the rate of mutation, 
only mutations which give rise to a fitter genotype survive. Therefore, the only possible 
evolutionary paths of genotypes are ones with increasing fitness. In the evolutionary biology 
literature, these increasing paths are known as selectively accessible [SI El [2].

To analyse the number of such paths, we also require the relationship between genotype
and fitness. For this, we use the House of Cards model [U [5], in which the fitnesses of the
genotypes are independent and identically distributed continuous random variables. Since we only care about
whether the fitnesses along a path are in increasing order, as long as the random variables 
are continuous, the precise distribution is not important. 

The space of genotypes together with their fitnesses forms a labelled graph. If we further
assume that the population initially consists of one single genotype, and that separate mutations
never give rise to the same genotype, then the space of genotypes becomes a rooted tree.
A selectively accessible or increasing path is then a simple path from the root to a leaf along 
vertices with increasing labels. For the House of Cards model, we may assume that the root 
has the genotype of minimal fitness. This leads us precisely to the accessibility percolation 
model outlined above. 



1.2 Other models 

Our methods could be extended to consider, for example, Galton-Watson trees instead of
$n$-ary trees. We might also be able to fine-tune our methods to gain information about the
finer behaviour near the critical point $\alpha = 1/e$, but this would be highly technical work and
seems unlikely to offer further insight into the model.

Besides trees it is also natural to consider the House of Cards model on the $n$-dimensional
hypercube $\{0, 1\}^n$, for which there has been recent progress [2, 3]. A selectively accessible path
in this setting is a path of minimal length on increasing labels from $(0, \dots, 0)$ to $(1, \dots, 1)$.
Both papers consider the effect of varying the fitness at the zero vertex on the number of
accessible paths. Hegarty and Martinsson obtain the threshold for the phase transition of the
existence of increasing paths as $n \to \infty$.

Berestycki, Brunet and Shi show that around this threshold, the number of such paths 
converges in distribution to the product of two independent exponential variables. As a first 
step, they obtain results for a particular rooted tree related to the hypercube. 

Hegarty and Martinsson also consider another model for the relationship between genotype 
and fitness, known as the Rough Mount Fuji model in the biology literature [1], where a linear 
drift, depending on the distance to the root, is introduced to the random fitnesses. This model 
on n-ary trees was also considered in [7]. 



1.3 Notation 

Throughout, we assume without loss of generality that the distribution of the labels is U[0, 1],
and use the following crude double bound for Stirling's approximation, valid for all $n \ge 1$:

$$2 < \frac{n!}{\sqrt{n}\,(n/e)^n} < 3.$$
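
The double bound can be checked numerically over a finite range. The sketch below is only a sanity check, not a proof; the helper names `stirling_ratio` and `bound_holds` are hypothetical, and the ratio is computed through logarithms (using `math.lgamma`) so that large factorials do not overflow.

```python
import math


def stirling_ratio(n):
    """Return n! / (sqrt(n) * (n/e)**n), computed via logarithms to avoid overflow."""
    log_ratio = math.lgamma(n + 1) - 0.5 * math.log(n) - n * (math.log(n) - 1.0)
    return math.exp(log_ratio)


def bound_holds(n_max):
    """Check 2 < n!/(sqrt(n) (n/e)^n) < 3 for every n = 1, ..., n_max."""
    return all(2.0 < stirling_ratio(n) < 3.0 for n in range(1, n_max + 1))
```

At $n = 1$ the ratio equals $e$, and as $n$ grows it decreases towards $\sqrt{2\pi}$, so both ends of the range sit comfortably inside $(2, 3)$.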

Let $\mathcal{P}$ be the set of simple paths from root to leaf in the tree; then $\#\mathcal{P} = n^h$. For a path
$u \in \mathcal{P}$, write $X(u) = (X(u_1), \dots, X(u_h))$ for the (i.i.d., U[0, 1]) labels on its vertices. For any
two paths $u, v \in \mathcal{P}$, let $a(u, v) = \max\{k : u_k = v_k\}$, with $a(u, v) = 0$ if $u_1 \ne v_1$. Clearly $X(u_j) = X(v_j)$ for all $j \le a(u, v)$.

Define

$$I = \{(x_1, \dots, x_h) \in [0, 1]^h : x_1 < x_2 < \dots < x_h\},$$

and for $\varepsilon \in [0, 1)$,

$$C_\varepsilon = \{(x_1, \dots, x_h) \in [0, 1]^h : x_j \ge \varepsilon \ \ \forall j\}.$$

Similarly, let

$$D_\varepsilon = \{(x_1, \dots, x_h) \in [0, 1]^h : x_j \ge \varepsilon + (1 - \varepsilon)(j - 1)/h \ \ \forall j\}.$$

Define

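$$N_\varepsilon = \sum_{u \in \mathcal{P}} \mathbf{1}_{\{X(u) \in I \cap D_\varepsilon\}},$$

and

$$N = \sum_{u \in \mathcal{P}} \mathbf{1}_{\{X(u) \in I\}}.$$

It is clear that $N_\varepsilon \le N$. We will attempt to show that $\mathbb{P}(N_\varepsilon \ge 1)$ is not too small when $\alpha > 1/e$
and $\alpha(1 - \varepsilon)e > 1$.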
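
A short worked computation helps to explain why $1/e$ is the natural threshold. It is a sketch added for illustration, using only the count $\#\mathcal{P} = n^h$, the fact that $h$ i.i.d. continuous labels are in increasing order with probability $1/h!$, and the Stirling bound stated above.

```latex
\begin{align*}
\mathbb{E}[N] &= n^h \, \mathbb{P}(U_1 < U_2 < \dots < U_h) = \frac{n^h}{h!} = \frac{(\alpha h)^h}{h!}, \\
\frac{(\alpha e)^h}{3\sqrt{h}} &< \mathbb{E}[N] < \frac{(\alpha e)^h}{2\sqrt{h}}
  \qquad \text{since } 2\sqrt{h}\,(h/e)^h < h! < 3\sqrt{h}\,(h/e)^h .
\end{align*}
```

So the expected number of increasing paths grows exponentially in $h$ when $\alpha > 1/e$, while for $\alpha < 1/e$ it decays exponentially, and then Markov's inequality gives $\mathbb{P}(N \ge 1) \le \mathbb{E}[N] \to 0$. The difficulty for $\alpha > 1/e$ is that a large first moment alone does not imply that $N \ge 1$ with high probability.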



2 Second moment bound 

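We break the second moment into a sum over $k$-forks, that is, pairs of paths that share exactly their first $k$ vertices.
To this end, for $k = 0, \dots, h$, let

$$N_\varepsilon^2(k) = \sum_{u, v \in \mathcal{P} :\, a(u, v) = k} \mathbf{1}_{\{X(u), X(v) \in I \cap D_\varepsilon\}}.$$

Then

$$N_\varepsilon^2 = \sum_{k=0}^{h} N_\varepsilon^2(k).$$

Clearly $N_\varepsilon^2(h) = N_\varepsilon$, and $\mathbb{E}[N_\varepsilon^2(0)] = \frac{n-1}{n}\,\mathbb{E}[N_\varepsilon]^2 \le \mathbb{E}[N_\varepsilon]^2$.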
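
The fork decomposition rests on a simple count of ordered pairs of paths by the length of their common prefix. The sketch below is an illustration only: paths are encoded as tuples of child indices, and the names `fork_count` and `fork_count_brute_force` are hypothetical. It compares the closed-form count with brute-force enumeration on a small tree.

```python
from itertools import product


def fork_count(n, h, k):
    """Number of ordered pairs (u, v) of root-to-leaf paths with a(u, v) = k."""
    if k == h:
        return n ** h  # u = v
    # n^k shared prefixes, n(n-1) ways to split at level k+1, free choices below.
    return n ** k * n * (n - 1) * n ** (2 * (h - k - 1))


def fork_count_brute_force(n, h, k):
    """Same count, by enumerating all pairs of paths encoded as child-index tuples."""
    paths = list(product(range(n), repeat=h))

    def shared_prefix_length(u, v):
        a = 0
        while a < h and u[a] == v[a]:
            a += 1
        return a

    return sum(1 for u in paths for v in paths if shared_prefix_length(u, v) == k)
```

Summing `fork_count(n, h, k)` over `k = 0, ..., h` recovers all $n^{2h}$ ordered pairs, which is the counting identity behind the decomposition of $N_\varepsilon^2$.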

Let $U = (U_1, \dots, U_h)$ and $V = (V_1, \dots, V_h)$ each be a sequence of i.i.d. U[0, 1] random
variables such that $U_j = V_j$ for all $j \le k$ and $U_j$ and $V_j$ are independent for $j > k$. Using the
fact that a uniform [0, 1] random variable conditioned to have value at least $\varepsilon$ is a uniform
$[\varepsilon, 1]$ random variable, we have for $k = 2, \dots, h - 1$,

$$\begin{aligned}
\mathbb{E}[N_\varepsilon^2(k)] &= n^k \cdot n(n-1) \cdot n^{2(h-k-1)} \cdot \mathbb{P}(U, V \in I \cap D_\varepsilon) \\
&= \frac{n-1}{n}\, n^{2h-k}\, \mathbb{P}(U, V \in I \cap D_\varepsilon \mid U, V \in C_\varepsilon)\, \mathbb{P}(U, V \in C_\varepsilon) \\
&= \frac{n-1}{n}\, (\alpha h)^{2h-k} (1-\varepsilon)^{2h-k}\, \mathbb{P}(U, V \in I \cap D_0).
\end{aligned}$$

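Now,

$$\begin{aligned}
\mathbb{P}(U, V \in I \cap D_0) &= \int_0^1 \mathbb{P}(U, V \in I \cap D_0 \mid U_k = x)\, dx \\
&\le \int_{(k-1)/h}^1 \mathbb{P}(U_1 < U_2 < \dots < U_{k-1} < x)\, \mathbb{P}(x < U_{k+1} < U_{k+2} < \dots < U_h)^2\, dx \\
&= \int_{(k-1)/h}^1 \frac{x^{k-1}}{(k-1)!} \cdot \frac{(1-x)^{2h-2k}}{(h-k)!^2}\, dx.
\end{aligned}$$

The curve $x^{k-1}(1-x)^{2h-2k}$ is decreasing on $x > (k-1)/(2h-k-1)$, so since $(k-1)/(2h-k-1) \le (k-1)/h$ for $k \le h-1$, for all $x \ge (k-1)/h$ we have

$$\frac{x^{k-1}}{(k-1)!} \cdot \frac{(1-x)^{2h-2k}}{(h-k)!^2} \le \frac{((k-1)/h)^{k-1}}{(k-1)!} \cdot \frac{((h-k+1)/h)^{2h-2k}}{(h-k)!^2}.$$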

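Putting these estimates together and then applying Stirling's approximation, we obtain that
for $k = 2, \dots, h - 1$,

$$\begin{aligned}
\mathbb{E}[N_\varepsilon^2(k)] &\le (\alpha(1-\varepsilon))^{2h-k}\, h\, \frac{(k-1)^{k-1}}{(k-1)!} \cdot \frac{(h-k+1)^{2h-2k}}{(h-k)!^2} \\
&\le (\alpha(1-\varepsilon))^{2h-k}\, h\, \frac{e^{k-1}}{2(k-1)^{1/2}} \cdot \frac{e^{2h-2k+2}}{4(h-k+1)} \\
&= \frac{e}{8} \cdot \frac{(\alpha(1-\varepsilon)e)^{2h-k}\, h}{(k-1)^{1/2}(h-k+1)}.
\end{aligned}$$

Similarly,

$$\mathbb{E}[N_\varepsilon^2(1)] \le n^{2h-1}\, \frac{(1-\varepsilon)^{2h-1}}{(h-1)!^2} \le \frac{e}{4}\,(\alpha(1-\varepsilon)e)^{2h-1}.$$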



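Thus if $\alpha(1-\varepsilon)e > 1$, then, since $h/(h-k+1) \le k$ for $1 \le k \le h$, for some constant $c$,

$$\mathbb{E}[N_\varepsilon^2] = \sum_{k=0}^{h} \mathbb{E}[N_\varepsilon^2(k)] \le \mathbb{E}[N_\varepsilon] + \mathbb{E}[N_\varepsilon]^2 + \sum_{k=1}^{h-1} \mathbb{E}[N_\varepsilon^2(k)] \le \mathbb{E}[N_\varepsilon] + \mathbb{E}[N_\varepsilon]^2 + c\,(\alpha(1-\varepsilon)e)^{2h}.$$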

3 First moment bound



To bound $\mathbb{E}[N_\varepsilon]$, we shall need the following lemma.

Lemma 2. Let $U_1, \dots, U_j$ be i.i.d. U[0, 1] random variables. Then

$$\mathbb{P}\Big(U_1 < \dots < U_j,\ U_1 \ge \frac{1}{j+1},\ \dots,\ U_j \ge \frac{j}{j+1}\Big) = \frac{1}{(j+1)!}.$$



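Proof. Let

$$p = \mathbb{P}\Big(U_1 < \dots < U_j,\ U_1 \ge \frac{1}{j+1},\ \dots,\ U_j \ge \frac{j}{j+1}\Big),$$

and for each $i = 2, \dots, j$, define

$$I_i = \int_{\frac{j}{j+1}}^{1} \int_{\frac{j-1}{j+1}}^{v_j} \cdots \int_{\frac{i}{j+1}}^{v_{i+1}} \left( \frac{v_i^{i-1}}{(i-1)!} - \frac{v_i^{i-2}}{(j+1)(i-2)!} \right) dv_i \dots dv_j.$$

Note that

$$p = \int_{\frac{j}{j+1}}^{1} \int_{\frac{j-1}{j+1}}^{v_j} \cdots \int_{\frac{1}{j+1}}^{v_2} 1 \, dv_1 \dots dv_j
= \int_{\frac{j}{j+1}}^{1} \int_{\frac{j-1}{j+1}}^{v_j} \cdots \int_{\frac{2}{j+1}}^{v_3} \Big( v_2 - \frac{1}{j+1} \Big)\, dv_2 \dots dv_j = I_2.$$

But for each $i = 2, \dots, j - 1$,

$$\begin{aligned}
I_i &= \int_{\frac{j}{j+1}}^{1} \int_{\frac{j-1}{j+1}}^{v_j} \cdots \int_{\frac{i}{j+1}}^{v_{i+1}} \left( \frac{v_i^{i-1}}{(i-1)!} - \frac{v_i^{i-2}}{(j+1)(i-2)!} \right) dv_i \dots dv_j \\
&= \int_{\frac{j}{j+1}}^{1} \int_{\frac{j-1}{j+1}}^{v_j} \cdots \int_{\frac{i+1}{j+1}}^{v_{i+2}} \left( \frac{v_{i+1}^{i}}{i!} - \frac{v_{i+1}^{i-1}}{(j+1)(i-1)!} \right) dv_{i+1} \dots dv_j = I_{i+1},
\end{aligned}$$

since the antiderivative $v^{i}/i! - v^{i-1}/((j+1)(i-1)!)$ vanishes at $v = i/(j+1)$. Therefore

$$p = I_2 = I_j = \int_{\frac{j}{j+1}}^{1} \left( \frac{v_j^{j-1}}{(j-1)!} - \frac{v_j^{j-2}}{(j+1)(j-2)!} \right) dv_j
= \frac{1}{j!} - \frac{1}{(j+1)(j-1)!} = \frac{j + 1 - j}{(j+1)!} = \frac{1}{(j+1)!},$$

as claimed. □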
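
The identity in Lemma 2 can be verified exactly for small $j$ by carrying out the iterated integral with rational arithmetic. The sketch below is an independent check rather than the paper's argument; polynomials are stored as coefficient lists of `Fraction` values, and the names `integrate_from`, `evaluate` and `lemma2_probability` are hypothetical.

```python
from fractions import Fraction
from math import factorial


def integrate_from(poly, lower):
    """Return the polynomial v -> integral of poly(t) dt for t from `lower` to v.

    Polynomials are coefficient lists: poly[i] is the coefficient of t**i.
    """
    anti = [Fraction(0)] + [c / (i + 1) for i, c in enumerate(poly)]
    # Shift the constant term so that the result vanishes at v = lower.
    anti[0] = -sum(c * lower ** i for i, c in enumerate(anti))
    return anti


def evaluate(poly, x):
    """Evaluate a coefficient-list polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(poly))


def lemma2_probability(j):
    """Exact value of P(U_1 < ... < U_j, U_i >= i/(j+1) for all i)."""
    poly = [Fraction(1)]  # integrand 1 in the innermost variable v_1
    for i in range(1, j):
        # Integrate out v_i over [i/(j+1), v_{i+1}].
        poly = integrate_from(poly, Fraction(i, j + 1))
    # Outermost integral: v_j over [j/(j+1), 1].
    return evaluate(integrate_from(poly, Fraction(j, j + 1)), Fraction(1))
```

For example, `lemma2_probability(j) == Fraction(1, factorial(j + 1))` can be asserted for a range of small `j`; exact fractions avoid any rounding question.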

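Now,

$$\mathbb{E}[N_\varepsilon] = n^h\, \mathbb{P}(U \in I \cap D_\varepsilon) = n^h\, \mathbb{P}(U \in I \cap D_\varepsilon \mid U \in C_\varepsilon)\, \mathbb{P}(U \in C_\varepsilon) = (\alpha h (1-\varepsilon))^h\, \mathbb{P}(U \in I \cap D_0).$$

But by Lemma 2,

$$\mathbb{P}(U \in I \cap D_0) \ge \mathbb{P}(U_1 \le 1/h)\, \mathbb{P}\Big(U_2 < \dots < U_h,\ U_i \ge \frac{i-1}{h}\ \ \forall i = 2, \dots, h\Big) = \frac{1}{h \cdot h!}.$$

Applying Stirling's approximation once more, we obtain

$$\mathbb{E}[N_\varepsilon] \ge \frac{(\alpha(1-\varepsilon)e)^h}{3h^{3/2}}.$$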

4 Proof of Theorem 1

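By the Paley-Zygmund inequality,

$$\mathbb{P}\Big(N_\varepsilon \ge \frac{\mathbb{E}[N_\varepsilon]}{2}\Big) \ge \frac{\mathbb{E}[N_\varepsilon]^2}{4\,\mathbb{E}[N_\varepsilon^2]}.$$

From our bounds on the first and second moments of $N_\varepsilon$, we get for some $\delta > 0$,

$$\mathbb{P}(N_\varepsilon \ge \exp(\delta h)) \ge c' h^{-3} \qquad (1)$$

for some constant $c' > 0$.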
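
One way to see how (1) can follow from the two moment bounds is sketched below. It assumes the bounds of Sections 2 and 3 in the form $\mathbb{E}[N_\varepsilon] \ge \beta^h/(3h^{3/2})$ and $\mathbb{E}[N_\varepsilon^2] \le \mathbb{E}[N_\varepsilon] + \mathbb{E}[N_\varepsilon]^2 + c\beta^{2h}$, where the shorthand $\beta = \alpha(1-\varepsilon)e > 1$ is introduced here only for illustration, and it takes $h$ large enough that $\mathbb{E}[N_\varepsilon] \ge 1$.

```latex
\begin{align*}
\frac{\mathbb{E}[N_\varepsilon]^2}{4\,\mathbb{E}[N_\varepsilon^2]}
  &\ge \frac{\mathbb{E}[N_\varepsilon]^2}{4\bigl(2\,\mathbb{E}[N_\varepsilon]^2 + c\beta^{2h}\bigr)}
     && \text{as } \mathbb{E}[N_\varepsilon] \le \mathbb{E}[N_\varepsilon]^2 \text{ once } \mathbb{E}[N_\varepsilon] \ge 1 \\
  &= \frac{1}{4\bigl(2 + c\beta^{2h}/\mathbb{E}[N_\varepsilon]^2\bigr)} \\
  &\ge \frac{1}{4\,(2 + 9c\,h^{3})}
     && \text{as } \mathbb{E}[N_\varepsilon]^2 \ge \beta^{2h}/(9h^{3}) \\
  &\ge \frac{1}{4\,(2 + 9c)}\, h^{-3}.
\end{align*}
```

Moreover $\mathbb{E}[N_\varepsilon]/2 \ge \beta^h/(6h^{3/2}) \ge \exp(\delta h)$ for any fixed $0 < \delta < \log\beta$ and all sufficiently large $h$, so the event in the Paley-Zygmund inequality is contained in $\{N_\varepsilon \ge \exp(\delta h)\}$.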

To complete the proof, we will consider the first four levels of the tree separately from the 
rest. We require the following form of Hoeffding's inequality. 

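Lemma 3. Suppose that $\varepsilon \in (0, 1)$ and $0 \le a \le 1 - \varepsilon/4$. If $A_1, \dots, A_r$ are i.i.d. U[0, 1]
random variables, then

$$\mathbb{P}(\#\{j : A_j \in [a, a + \varepsilon/4)\} < r\varepsilon/8) \le \exp(-\varepsilon^2 r/16).$$

For $j = 1, 2, 3, 4$, let $M_j$ be the set of vertices $v$ at the $j$th level of the tree such that
$X(v_i) \in [(i-1)\varepsilon/4, i\varepsilon/4)$ for each $i = 1, \dots, j$, where $v_i$ denotes the ancestor of $v$ at level $i$ (so $v_j = v$).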

Lemma 4.

$$\mathbb{P}(\#M_4 < n^4\varepsilon^4/8^4) \le 4\exp(-\varepsilon^5 n/8192).$$

Proof. At level 1, there are $n$ vertices; thus by Lemma 3 the probability that fewer than $n\varepsilon/8$
of them have label in $[0, \varepsilon/4)$ is at most $\exp(-\varepsilon^2 n/16)$. That is,

$$\mathbb{P}(\#M_1 < n\varepsilon/8) \le \exp(-\varepsilon^2 n/16).$$

At level 2, given that $\#M_1 \ge n\varepsilon/8$, there are at least $n^2\varepsilon/8$ vertices whose parent had label
in $[0, \varepsilon/4)$, and the probability that fewer than $n^2\varepsilon^2/8^2$ of these have label in $[\varepsilon/4, \varepsilon/2)$ is
(again by Lemma 3) at most $\exp(-\varepsilon^3 n^2/(2 \cdot 8^2))$. That is,

$$\mathbb{P}(\#M_2 < n^2\varepsilon^2/8^2 \mid \#M_1 \ge n\varepsilon/8) \le \exp(-\varepsilon^3 n^2/(2 \cdot 8^2)).$$

Similarly,

$$\mathbb{P}(\#M_3 < n^3\varepsilon^3/8^3 \mid \#M_2 \ge n^2\varepsilon^2/8^2) \le \exp(-\varepsilon^4 n^3/(2 \cdot 8^3))$$

and

$$\mathbb{P}(\#M_4 < n^4\varepsilon^4/8^4 \mid \#M_3 \ge n^3\varepsilon^3/8^3) \le \exp(-\varepsilon^5 n^4/(2 \cdot 8^4)).$$

Summing these estimates gives the result. □
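
The summation step in the proof of Lemma 4 can be spelled out as follows. This is a sketch of one way to combine the four estimates; the thresholds $t_j = n^j\varepsilon^j/8^j$ are notation introduced here for brevity.

```latex
\begin{align*}
\mathbb{P}(\#M_4 < t_4)
  &\le \mathbb{P}(\#M_1 < t_1) + \sum_{j=2}^{4} \mathbb{P}(\#M_j < t_j,\ \#M_{j-1} \ge t_{j-1}) \\
  &\le \mathbb{P}(\#M_1 < t_1) + \sum_{j=2}^{4} \mathbb{P}(\#M_j < t_j \mid \#M_{j-1} \ge t_{j-1}) \\
  &\le \exp\!\Big(-\frac{\varepsilon^2 n}{16}\Big) + \exp\!\Big(-\frac{\varepsilon^3 n^2}{128}\Big)
     + \exp\!\Big(-\frac{\varepsilon^4 n^3}{1024}\Big) + \exp\!\Big(-\frac{\varepsilon^5 n^4}{8192}\Big) \\
  &\le 4\exp\!\Big(-\frac{\varepsilon^5 n}{8192}\Big).
\end{align*}
```

The last line uses $\varepsilon < 1$ and $n \ge 1$, so that each of the four exponents is at least $\varepsilon^5 n/8192$ in absolute value.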

To complete the proof of Theorem 1, note that

$$\mathbb{P}(N < \exp(\delta h)) \le \mathbb{P}(\#M_4 < n^4\varepsilon^4/8^4) + \mathbb{P}(N < \exp(\delta h),\ \#M_4 \ge n^4\varepsilon^4/8^4).$$

Suppose that $u \in M_4$, and consider the subtree of height $h - 4$ rooted at the vertex $u$. In
order that $N \le e^{\delta h}$, it must hold that there are no more than $e^{\delta h}$ paths in this subtree that
have labels ordered and greater than $\varepsilon$. But we know from (1), since $n/(h-4) \ge n/h = \alpha$,
that the probability of this event is at most $1 - c'h^{-3}$. Since the subtrees rooted at distinct vertices of $M_4$ are independent, applying also Lemma 4 gives

$$\mathbb{P}(N \le \exp(\delta h)) \le 4\exp(-\varepsilon^5 n/8192) + (1 - c'h^{-3})^{n^4\varepsilon^4/8^4} \le \exp(-\eta h)$$

for some $\eta > 0$, which proves Theorem 1.
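
The final inequality also shows why four levels are split off: each subtree succeeds with probability only of order $h^{-3}$, and $n^4 = \alpha^4 h^4$ independent attempts are just enough to make the failure probability exponentially small. A short computation, given here as a sketch using $1 - x \le e^{-x}$ and $n = \alpha h$:

```latex
\begin{align*}
(1 - c'h^{-3})^{n^4\varepsilon^4/8^4}
  &\le \exp\!\Big(-c'h^{-3} \cdot \frac{\alpha^4 h^4 \varepsilon^4}{8^4}\Big)
   = \exp\!\Big(-\frac{c'\alpha^4\varepsilon^4}{4096}\, h\Big), \\
4\exp\!\Big(-\frac{\varepsilon^5 n}{8192}\Big)
  &= 4\exp\!\Big(-\frac{\alpha\varepsilon^5}{8192}\, h\Big).
\end{align*}
```

Both terms decay exponentially in $h$, so their sum is at most $\exp(-\eta h)$ for all sufficiently large $h$, for any fixed $\eta$ smaller than both rates.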